Cross-Language Information Retrieval using Dutch Query Translation
Authors
Abstract
This paper describes an elementary bilingual information retrieval experiment. The experiment takes Dutch topics to retrieve relevant English documents using Microsoft SQL Server version 7.0. To cross the language barrier between query and document, the researchers use query translation by means of a machine-readable dictionary. The Dutch run used none of the typical natural language processing techniques such as parsing, stemming, or part-of-speech tagging. A monolingual run was carried out for comparison purposes. Due to limitations in time, retrieval system, translation method, and test collection, there is only a preliminary analysis of the results.

Introduction and problem description

Cross-Language Information Retrieval (CLIR) systems enable users to formulate queries in their native language to retrieve documents in foreign languages [1]. In CLIR, retrieval is not restricted to the query language. Rather, queries in one language are used to retrieve documents in multiple languages. Because queries and documents in CLIR do not necessarily share the same language, translation is needed before matching can take place. This translation step tends to reduce cross-language retrieval performance compared with monolingual information retrieval.

The literature explores four different translation options: translating queries (e.g. [2], [3]), translating documents [4], [5], translating both queries and documents [6], and cognate matching [7]. The prevailing CLIR approach is query translation. The translation of queries is inherently difficult because there is no one-to-one mapping between a lexical item and its meaning. This creates lexical ambiguity. Further, query translation is complicated by the cultural differences between language communities and the way they lexicalize the world around them. These two translation issues create many different translation problems such as lexical ambiguity, lexical mismatches, and lexical holes. In turn, these and other translation problems result in translation errors which impact CLIR retrieval performance.

The Cross-Language Evaluation Forum (CLEF) provides a multilingual test collection to study CLIR using European languages. One of the CLEF tasks is bilingual information retrieval. The aim of the bilingual task is the retrieval of documents in a language different from the topic (query) language. Unlike the multilingual task, only two languages are involved and retrieval results are monolingual. For the bilingual run we used the Dutch topic set (40 topics) to retrieve English documents (Los Angeles Times of 1994 – 113,005 documents, 409,600 KB). We were completely oblivious to CLEF and its deadlines, but we happened to hear that CLEF results were due in one week. We immediately signed up and started on our mad rush to get results in on time.
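The dictionary-based query translation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, its entries, and the function name `translate_query` are assumptions made purely for illustration. It shows word-by-word lookup without stemming or tagging, where an ambiguous Dutch word contributes several English candidates and a word with no dictionary entry is left untranslated.

```python
# Hypothetical toy Dutch-to-English dictionary; the entries are illustrative
# and are not taken from the machine-readable dictionary used in the paper.
TOY_DICTIONARY = {
    "bank": ["bank", "couch"],  # lexical ambiguity: one form, several senses
    "fiets": ["bicycle"],
    "diefstal": ["theft"],
}


def translate_query(dutch_query, dictionary):
    """Replace each query word with all of its dictionary translations.

    No parsing, stemming, or part-of-speech tagging is applied: the query is
    lowercased, split on whitespace, and each word is looked up as-is.
    Returns a pair (translated_terms, untranslated_words).
    """
    translated = []
    untranslated = []
    for word in dutch_query.lower().split():
        candidates = dictionary.get(word)
        if candidates:
            # Keep every candidate; no sense disambiguation is attempted.
            translated.extend(candidates)
        else:
            # No entry: the word cannot cross the language barrier.
            untranslated.append(word)
    return translated, untranslated


terms, missing = translate_query("Diefstal fiets bank gezellig", TOY_DICTIONARY)
print(terms)
print(missing)

# Without stemming, an inflected form such as the plural "fietsen" is not found.
plural_terms, plural_missing = translate_query("fietsen", TOY_DICTIONARY)
print(plural_terms, plural_missing)
```

In this sketch every candidate translation is kept, so ambiguity inflates the English query, while missing entries and unmatched inflected forms drop out of it. Both effects are examples of the translation problems the introduction names.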
Similar resources
Query Translation for Cross-lingual Information Retrieval using Wikipedia
In this paper the system WikiTranslate is introduced that performs query translation for cross-lingual information retrieval (CLIR) that only uses Wikipedia. Queries will be mapped to Wikipedia concepts and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics in Dutch, French and Spanish i...
English-Dutch CLIR Using Query Translation Techniques
We present a report on our participation in the English-Dutch bilingual task of the 2001 Cross-Language Evaluation Forum (CLEF). We attempted to demonstrate that good cross language query translation results can be obtained by combining a dictionary based and parallel corpus based techniques. A parallel corpus based technique was used to choose the best sense from all possible senses found in t...
When to Cross Over? Cross-Language Linking Using Wikipedia for VideoCLEF 2009
We describe Dublin City University (DCU)’s participation in the VideoCLEF 2009 Linking Task. Two approaches were implemented using the Lemur information retrieval toolkit. Both approaches first extracted a search query from the transcriptions of the Dutch TV broadcasts. One method first performed search on a Dutch Wikipedia archive, then followed links to corresponding pages in the English Wiki...
Translation Events in Dutch Cross-Language Information Retrieval
The paper describes an analysis of the translation events encountered when queries cross the language barrier in crosslanguage information retrieval. A study of a set of query source and target triples resulted in the creation of a translation taxonomy. The taxonomy was used to code 750 English target queries. The 750 coded queries are currently being used in retrieval experiments to assess the...
WikiTranslate: Query Translation for Cross-lingual Information Retrieval using only Wikipedia
This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics formulated in Dutch, Fr...
Publication date: 2000